P2P Web Search: Make It Light, Make It Fly (Demo)
Abstract
We propose a live demonstration of MinervaLight, a P2P Web search engine. MinervaLight combines three previously separate components under one common user interface: the focused crawler BINGO!, which harvests Web data; the local search engine TopX; and our P2P Web search system MINERVA. The crawler runs unattended, downloading and indexing Web data, and the scope of the focused crawl can be tailored to the user's thematic interest profile. The result of this process is a local search index, which TopX uses to evaluate user queries. In the background, MinervaLight continuously computes compact statistical synopses that describe a user's local search index and publishes them to a conceptually global but physically fully decentralized directory. MinervaLight offers a search interface through which users can submit queries to MINERVA. Query routing strategies use the statistical synopses in the directory to identify the most promising peers for each query. The query is forwarded to the selected peers and evaluated against their local indexes. The results are sent back to the query initiator and merged into a single result list. We give a live demonstration of the fully functional system.
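The publish, route, and merge steps described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Directory` class, the use of per-term document frequencies as the synopsis, the sum-of-frequencies peer score, and the `merge` helper are hypothetical and are not MINERVA's actual data structures or routing strategies. The directory is also modeled as a single in-memory object, whereas the real one is physically decentralized.

```python
from collections import defaultdict


class Directory:
    """Toy stand-in for the directory: term -> {peer: document frequency}."""

    def __init__(self):
        self.posts = defaultdict(dict)

    def publish(self, peer, synopsis):
        # A peer publishes one statistic per term of its local index.
        # Assumption: the synopsis is just a document frequency per term.
        for term, doc_freq in synopsis.items():
            self.posts[term][peer] = doc_freq

    def route(self, query_terms, k):
        # Score each peer by summing its document frequencies over the
        # query terms (a simple hypothetical heuristic), then keep the top k.
        scores = defaultdict(int)
        for term in query_terms:
            for peer, doc_freq in self.posts.get(term, {}).items():
                scores[peer] += doc_freq
        ranked = sorted(scores, key=lambda peer: (-scores[peer], peer))
        return ranked[:k]


def merge(result_lists, top_n):
    # Merge per-peer lists of (url, score) pairs into one ranked list,
    # keeping the best score seen for each URL.
    # Assumption: scores from different peers are directly comparable.
    best = {}
    for results in result_lists:
        for url, score in results:
            if score > best.get(url, float("-inf")):
                best[url] = score
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))[:top_n]


directory = Directory()
directory.publish("peerA", {"p2p": 10, "search": 5})
directory.publish("peerB", {"p2p": 2})
directory.publish("peerC", {"search": 20, "crawler": 7})

chosen = directory.route(["p2p", "search"], k=2)
merged = merge(
    [[("u1", 0.9), ("u2", 0.4)], [("u2", 0.7), ("u3", 0.1)]],
    top_n=2,
)
```

In this sketch the query initiator contacts only the peers in `chosen`, which is the point of routing: the query is not broadcast to every peer in the network.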
Similar resources
Bridging the P2P and WWW Divide with DISCOVIR - DIStributed COntent-based Visual Information Retrieval
In the light of image retrieval evolving from text annotation to content-based and from standalone applications to web-based search engines, we foresee the need for deploying content-based image retrieval (CBIR) into Peer-to-Peer (P2P) architecture. By doing so, we not only distribute the tasks of feature extraction, indexing and storage of image data into peers, we also introduce another aspec...
Portable Desktop Applications Based on P2P Transportation and Virtualization
Play-on-demand is usually regarded as a feasible access mode for web content (including streaming video, web pages and so on), web services and some Software-as-a-Service (SaaS) applications, but not for common desktop applications. This paper presents such a solution for Windows desktop applications, based on lightweight virtualization and network transportation technologies, which allows a user...
Image flip CAPTCHA
Massive automated access to Web resources by robots has made it essential for Web service providers to determine whether the "user" is a human or a robot. A Human Interaction Proof (HIP) such as the Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) offers a way to make this distinction. CAPTCHA is a reverse Turing test used by Web serv...
Document Clustering for Distributed Fulltext Search
Recent research efforts in peer-to-peer (P2P) systems concentrate on providing a “distributed hash table”-like primitive in the P2P system (Stoica et al., 2001). However, to make P2P systems useful, we need to build a keyword search engine to index the entire document collection in the distributed system. Doing keyword search in a distributed environment poses new challenges for traditional inf...
Towards large scale peer-to-peer web search
Web search engines, such as Google and Yahoo, are based on the centralized database model. Search engines using this model suffer from several drawbacks: a single point of failure, a limited representation of the web, an index that is not up to date, and limited scalability. Currently a lot of research is being done on using peer-to-peer (P2P) technology for the u...